Here we use pointblank to create a data validation report. In the resulting report tables, any validation step that has failing rows should have a CSV button. The button downloads a .csv file containing only the rows of data that fail that particular validation step.
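The CSV button is not the only way to get at failing rows. They can also be pulled out in code, which is useful when re-checking data in a script. The sketch below assumes a reasonably recent pointblank release. The three-row data frame ht_demo and the output path are hypothetical. It keeps the failing rows during interrogate(), retrieves them with get_data_extracts(), and writes them to a .csv file.

```r
library(pointblank)

# Hypothetical toy data (not the real data set): plant p2 has a height outside 0-200 cm
ht_demo <- data.frame(
  plant_id = c("p1", "p2", "p3"),
  ht       = c(20, 250, 35)
)

# Uses the native pipe |> (R >= 4.1)
agent <- create_agent(tbl = ht_demo, label = "Data Validation") |>
  col_vals_between(columns = vars(ht), left = 0, right = 200) |>
  # extract_failed = TRUE (the default) keeps the failing rows of each step
  interrogate(extract_failed = TRUE)

# The failing rows for step 1, i.e. what the report's CSV button offers
failed_rows <- get_data_extracts(agent, i = 1)

# Hypothetical output location: a temporary file
csv_path <- tempfile(fileext = ".csv")
write.csv(failed_rows, csv_path, row.names = FALSE)

# Builds the HTML report table (a gt object); print it to view
report <- get_agent_report(agent)
```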

Dataset Structure: Data types

Action levels: strict criteria (error if any rows fail)

Pointblank Validation: Data Validation
Table: tibble; WARN 1, STOP 0.02, NOTIFY not set

| STEP | VALIDATION      | LABEL                               | VALUES       | UNITS | PASS       | FAIL     |
|------|-----------------|-------------------------------------|--------------|-------|------------|----------|
| 1    | col_vals_expr() | Height is measured to nearest cm    | ht %% 1 == 0 | 57K   | 57K (1.00) | 0 (0.00) |
| 2    | col_vals_expr() | Number of shoots is integer         | shts %% 1 == 0 | 57K | 57K (1.00) | 0 (0.00) |
| 3    | col_vals_expr() | Number of inflorescences is integer | infl %% 1 == 0 | 2K  | 2K (1.00)  | 0 (0.00) |

Interrogation: 2023-05-25 21:58:27 EDT to 2023-05-25 21:58:27 EDT (< 1 s)
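Each step above tests whether a value is a whole number by checking that its remainder after division by 1 is zero. The sketch below shows one way such steps could be declared with pointblank. It assumes numeric columns ht, shts and infl as named in the report. The toy data frame types_demo is hypothetical, and the labels are copied from the report.

```r
library(pointblank)

# Hypothetical toy data standing in for the plant data (ht in cm)
types_demo <- data.frame(
  ht   = c(12, 30, 45),
  shts = c(1, 2, 3),
  infl = c(0, 1, 2)
)

agent_types <- create_agent(
  tbl = types_demo,
  label = "Data Validation",
  actions = action_levels(warn_at = 1, stop_at = 0.02)
) |>
  # x %% 1 is the remainder after dividing by 1, so it is 0 only for whole numbers
  col_vals_expr(expr = ~ ht %% 1 == 0, label = "Height is measured to nearest cm") |>
  col_vals_expr(expr = ~ shts %% 1 == 0, label = "Number of shoots is integer") |>
  col_vals_expr(expr = ~ infl %% 1 == 0, label = "Number of inflorescences is integer") |>
  interrogate()
```

A height recorded as 12.5 would give 12.5 %% 1 == 0.5 and fail the first step.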

Dataset Structure: Plot & Subplot IDs

Action levels: strict criteria (error if any rows fail)

Pointblank Validation: Data Validation
Table: tibble; WARN 1, STOP 0.02, NOTIFY not set

| STEP | VALIDATION        | COLUMNS | VALUES | UNITS | PASS       | FAIL     |
|------|-------------------|---------|--------|-------|------------|----------|
| 1    | col_vals_in_set() | plot_id | CF-1, CF-2, CF-3, CF-4, CF-5, CF-6, FF-1, FF-2, FF-3, FF-4, FF-5, FF-6, FF-7 | 67K | 67K (1.00) | 0 (0.00) |
| 2    | col_vals_in_set() | subplot | A1 to J10: each letter A-J combined with each number 1-10 (A1, ..., A10, B1, ..., J10; 100 values) | 67K | 67K (1.00) | 0 (0.00) |

Interrogation: 2023-05-25 21:58:28 EDT to 2023-05-25 21:58:28 EDT (< 1 s)
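The subplot set is regular: rows A-J crossed with positions 1-10. Both allowed sets can therefore be generated rather than typed out. The sketch below assumes the column names shown in the report (plot_id, subplot). The two-row data frame ids_demo is hypothetical.

```r
library(pointblank)

# Allowed plot IDs as listed in the report: CF-1..CF-6 and FF-1..FF-7
plot_ids <- c(paste0("CF-", 1:6), paste0("FF-", 1:7))

# Allowed subplots A1..A10, B1..B10, ..., J1..J10 (100 codes)
subplot_ids <- paste0(rep(LETTERS[1:10], each = 10), 1:10)

# Hypothetical toy data containing only valid IDs
ids_demo <- data.frame(
  plot_id = c("CF-1", "FF-7"),
  subplot = c("A1", "J10")
)

agent_ids <- create_agent(tbl = ids_demo, label = "Data Validation") |>
  col_vals_in_set(columns = vars(plot_id), set = plot_ids) |>
  col_vals_in_set(columns = vars(subplot), set = subplot_ids) |>
  interrogate()
```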

Dataset Structure: Duplicated or Missing Values

Action levels: strict criteria (error if any rows fail)

Pointblank Validation: Data Validation
Table: tibble; WARN 1, STOP 0.02, NOTIFY not set

| STEP | VALIDATION          | LABEL                                    | COLUMNS  | UNITS | PASS       | FAIL     |
|------|---------------------|------------------------------------------|----------|-------|------------|----------|
| 1    | rows_distinct()     | duplicated rows                          |          | 67K   | 67K (1.00) | 0 (0.00) |
| 2    | col_vals_not_null() |                                          | plant_id | 67K   | 67K (1.00) | 0 (0.00) |
| 3    | rows_distinct()     | Check for duplicate IDs within each year | plant_id | 3K    | 3K (1.00)  | 0 (0.00) |
| 4    | rows_distinct()     | Check for duplicate IDs within each year | plant_id | 4K    | 4K (1.00)  | 0 (0.00) |
| 5    | rows_distinct()     | Check for duplicate IDs within each year | plant_id | 5K    | 5K (1.00)  | 0 (0.00) |
| 6    | rows_distinct()     | Check for duplicate IDs within each year | plant_id | 6K    | 6K (1.00)  | 0 (0.00) |
| 7    | rows_distinct()     | Check for duplicate IDs within each year | plant_id | 6K    | 6K (1.00)  | 0 (0.00) |
| 8    | rows_distinct()     | Check for duplicate IDs within each year | plant_id | 6K    | 6K (1.00)  | 0 (0.00) |
| 9    | rows_distinct()     | Check for duplicate IDs within each year | plant_id | 6K    | 6K (1.00)  | 0 (0.00) |
| 10   | rows_distinct()     | Check for duplicate IDs within each year | plant_id | 6K    | 6K (1.00)  | 0 (0.00) |
| 11   | rows_distinct()     | Check for duplicate IDs within each year | plant_id | 7K    | 7K (1.00)  | 0 (0.00) |
| 12   | rows_distinct()     | Check for duplicate IDs within each year | plant_id | 5K    | 5K (1.00)  | 0 (0.00) |
| 13   | rows_distinct()     | Check for duplicate IDs within each year | plant_id | 6K    | 6K (1.00)  | 0 (0.00) |
| 14   | rows_distinct()     | Check for duplicate IDs within each year | plant_id | 6K    | 6K (1.00)  | 0 (0.00) |

Interrogation: 2023-05-25 21:58:28 EDT to 2023-05-25 21:58:30 EDT (2.1 s)
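Steps 3-14 run the same duplicate-ID check once per year. One way to get a separate step per year without writing each one by hand is pointblank's segments argument, which splits a validation step by the values of a column. This is an assumption about how such steps could be produced, not necessarily how this report was built. The sketch also assumes a year column. The toy data frame dup_demo is hypothetical.

```r
library(pointblank)

# Hypothetical long-format toy data: one row per plant per year
dup_demo <- data.frame(
  plant_id = c("p1", "p2", "p1", "p2"),
  year     = c(2021, 2021, 2022, 2022),
  ht       = c(10, 12, 15, 18)
)

agent_dups <- create_agent(tbl = dup_demo, label = "Data Validation") |>
  # No columns given: whole rows must be distinct
  rows_distinct(label = "duplicated rows") |>
  col_vals_not_null(columns = vars(plant_id)) |>
  # One step per year: a plant_id may recur across years but not within one
  rows_distinct(
    columns = vars(plant_id),
    segments = vars(year),
    label = "Check for duplicate IDs within each year"
  ) |>
  interrogate()
```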

Plant Characteristics: Range of Size & Inflorescence No.

Action levels: warn if ≥ 1 row fails the conditions; error if ≥ 2% of rows fail the conditions.

Pointblank Validation: Data Validation
Table: tibble; WARN 1, STOP 0.02, NOTIFY not set

| STEP | VALIDATION         | LABEL                           | COLUMNS | VALUES   | UNITS | PASS       | FAIL      |
|------|--------------------|---------------------------------|---------|----------|-------|------------|-----------|
| 1    | col_vals_between() | shoots between 0 and 20         | shts    | [0, 20]  | 67K   | 67K (0.99) | 8 (0.01)  |
| 2    | col_vals_between() | height between 0 and 200 cm     | ht      | [0, 200] | 67K   | 67K (0.99) | 2 (0.01)  |
| 3    | col_vals_between() | inflorescences between 0 and 3  | infl    | [0, 3]   | 67K   | 67K (0.99) | 15 (0.01) |

Interrogation: 2023-05-25 21:58:31 EDT to 2023-05-25 21:58:31 EDT (< 1 s)
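In pointblank, an action-level threshold of 1 or more is read as a count of failing rows. A threshold below 1 is read as a fraction of rows. So action_levels(warn_at = 1, stop_at = 0.02) warns on the first failing row and stops only when 2% or more fail. The sketch below illustrates this with a hypothetical 100-row data frame, shts_demo, that has one out-of-range shoot count (1% failing). It reads the outcome back with get_agent_x_list().

```r
library(pointblank)

# Hypothetical toy data: 100 plants, one with an out-of-range shoot count
shts_demo <- data.frame(shts = c(rep(5, 99), 25))

# warn_at = 1 is a row count; stop_at = 0.02 is a fraction of rows
al <- action_levels(warn_at = 1, stop_at = 0.02)

agent_range <- create_agent(tbl = shts_demo, actions = al) |>
  col_vals_between(
    columns = vars(shts), left = 0, right = 20,
    label = "shoots between 0 and 20"
  ) |>
  interrogate()

step1 <- get_agent_x_list(agent_range, i = 1)
step1$n_failed  # number of failing rows
step1$warn      # was the warn level (>= 1 failing row) reached?
step1$stop      # was the stop level (>= 2% failing rows) reached?
```

Here 1 of 100 rows fails. That reaches the warn level but stays below the 2% stop level.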

Plant Growth: Change in size from yr(t) to yr(t+1)

Action levels: warn if ≥ 1 row fails the conditions; error if ≥ 2% of rows fail the conditions.

Pointblank Validation: Check growth & regression
Table: tibble; WARN 1, STOP 0.02, NOTIFY not set

| STEP | VALIDATION         | LABEL                          | COLUMNS   | VALUES      | UNITS | PASS       | FAIL       |
|------|--------------------|--------------------------------|-----------|-------------|-------|------------|------------|
| 1    | col_vals_lt()      | \|% change in height\| < 200%  | ht_pc     | 2           | 67K   | 66K (0.99) | 420 (0.01) |
| 2    | col_vals_between() | \|∆ height\| < 100 cm          | ht_diff   | [−100, 100] | 67K   | 67K (0.99) | 11 (0.01)  |
| 3    | col_vals_between() | \|∆ shoot number\| < 5         | shts_diff | [−5, 5]     | 67K   | 67K (0.99) | 201 (0.01) |

Interrogation: 2023-05-25 21:58:32 EDT to 2023-05-25 21:58:32 EDT (< 1 s)
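These growth checks need each plant's year-t record paired with its year-(t+1) record. The sketch below shows one way to build such columns in base R before validating them. Several assumptions are involved. The data are in long format with plant_id and year columns, and growth_demo is a hypothetical toy example. The definition of ht_pc as the absolute proportional change (so 2 means 200%) is a guess, not taken from the source.

```r
library(pointblank)

# Hypothetical long-format toy data: one row per plant per year
growth_demo <- data.frame(
  plant_id = c("p1", "p1", "p2", "p2"),
  year     = c(2021, 2022, 2021, 2022),
  ht       = c(10, 25, 40, 30),
  shts     = c(1, 3, 4, 4)
)

# Shift every record back one year, then join on (plant_id, year):
# each year-t row meets the same plant's year-(t+1) row
next_yr <- transform(growth_demo, year = year - 1)
paired  <- merge(growth_demo, next_yr, by = c("plant_id", "year"),
                 suffixes = c("", "_next"))  # inner join drops unmatched rows

paired$ht_diff   <- paired$ht_next - paired$ht
paired$shts_diff <- paired$shts_next - paired$shts
# Assumed definition: absolute proportional change (Inf if ht is 0)
paired$ht_pc     <- abs(paired$ht_diff) / paired$ht

agent_growth <- create_agent(
  tbl = paired,
  label = "Check growth & regression",
  actions = action_levels(warn_at = 1, stop_at = 0.02)
) |>
  col_vals_lt(columns = vars(ht_pc), value = 2) |>
  col_vals_between(columns = vars(ht_diff), left = -100, right = 100) |>
  col_vals_between(columns = vars(shts_diff), left = -5, right = 5) |>
  interrogate()
```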

Seedlings: Initial size

Action levels: warn if ≥ 1 row fails the conditions; error if ≥ 2% of rows fail the conditions.

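Pointblank Validation: Check seedlings
Table: tibble; WARN 1, STOP 0.02, NOTIFY not set

| STEP | VALIDATION    | LABEL          | COLUMNS | VALUES | UNITS | PASS      | FAIL      |
|------|---------------|----------------|---------|--------|-------|-----------|-----------|
| 1    | col_vals_lt() | shoots < 3     | shts    | 3      | 3K    | 3K (0.99) | 12 (0.01) |
| 2    | col_vals_lt() | height < 30 cm | ht      | 30     | 3K    | 3K (0.99) | 3 (0.01)  |

Interrogation: 2023-05-25 21:58:32 EDT to 2023-05-25 21:58:32 EDT (< 1 s)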
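These checks apply only to seedlings, about 3K rows rather than the full 67K. A step can be restricted to a subset with pointblank's preconditions argument, which accepts a function that filters the table before the step runs. The sketch below assumes a hypothetical stage column that marks seedlings. The toy data frame seed_demo and the helper only_seedlings() are also hypothetical.

```r
library(pointblank)

# Hypothetical toy data; "stage" is an assumed column marking seedlings
seed_demo <- data.frame(
  stage = c("seedling", "seedling", "adult"),
  shts  = c(1, 2, 8),
  ht    = c(5, 12, 80)
)

# Hypothetical helper: keep seedling rows only
only_seedlings <- function(x) x[x$stage == "seedling", ]

agent_seed <- create_agent(
  tbl = seed_demo,
  label = "Check seedlings",
  actions = action_levels(warn_at = 1, stop_at = 0.02)
) |>
  col_vals_lt(columns = vars(shts), value = 3, preconditions = only_seedlings) |>
  col_vals_lt(columns = vars(ht), value = 30, preconditions = only_seedlings) |>
  interrogate()
```

Without the precondition, the adult row (8 shoots, 80 cm) would fail both steps.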

Check for missing values: Height